Red Line

Coding challenge for Climate Farmers

1 Region Selection and Data Acquisition

1.1 Region Selection and Basemap Preparation

I decided to work with Portugal because it includes various types of land cover and shows variability in soil organic carbon and climate. I imported the country shape from the GADM database. As I want to work with mainland Portugal, I excluded Madeira and the Azores.

Shape of mainland Portugal used to restrict the analysis of soil, land cover and climate data.

1.2 Data Acquisition

I downloaded the following datasets for the analysis:
  1. Climate (years 2020-2022):
    • Temperature (unit: Kelvin):
      Air temperature 2 meters above land or water surfaces.
    • Evapotranspiration (unit: Meters):
      Accumulated amount of water that has evaporated from the Earth’s surface, including a simplified representation of transpiration from vegetation.
    • Precipitation (unit: Meters):
      Accumulated liquid and frozen water (rain, snow) falling to Earth’s surface, excluding fog, dew, and evaporated precipitation. The depth the water would have if it were spread evenly over the grid box.
  2. Land cover (year 2020):
    • Global gridded land cover classification:
      Land cover class per pixel, with 22 classes globally, defined using the Land Cover Classification System developed by the United Nations Food and Agriculture Organization.
  3. Soil organic carbon (SOC):
    • Soil organic carbon (unit: t/ha):
      Soil organic carbon content in 0-30 cm depth.
Each layer was clipped with the shape of mainland Portugal for the analysis.
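
The clipping step can be sketched as follows. This is a minimal example on synthetic data, assuming the terra package. The raster `temp_k` and the rectangular polygon `region` are hypothetical stand-ins for a downloaded temperature layer and the mainland Portugal shape. The Kelvin to °C conversion is included because the temperature map is shown in °C.

```r
library(terra)

# Hypothetical temperature layer in Kelvin on a 0.1 degree grid
temp_k <- rast(xmin = -10, xmax = -6, ymin = 36, ymax = 43,
               resolution = 0.1, crs = "EPSG:4326", vals = 283.15)

# Hypothetical rectangle standing in for the country polygon
region <- as.polygons(ext(-9.5, -6.5, 37, 42), crs = "EPSG:4326")

# Crop to the polygon's extent and set cells outside the polygon to NA
temp_clip <- crop(temp_k, region, mask = TRUE)

# Convert Kelvin to degrees Celsius
temp_c <- temp_clip - 273.15
```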

1.2.1 Temperature

Temperature (°C) in Portugal for January 2020.

1.2.2 Evapotranspiration

Evapotranspiration (m) in Portugal for January 2020.

1.2.3 Precipitation

Precipitation (m) in Portugal for January 2020.

1.2.4 Land Cover

Land cover classes in Portugal in 2020.

The label and color information was extracted from the metadata of the land cover layer. However, I slightly changed the colors of the three rainfed cropland classes so that they are easier to distinguish.

1.2.5 Soil

Soil organic carbon content (t/ha) in Portugal.

2 Data integration

2.1 Investigating spatial dimensions

I compared the resolution and origin of all layers (in degrees) to prepare them for joint analysis.

| Layer name | Resolution (lon) | Resolution (lat) | Origin (lon) | Origin (lat) |
|---|---|---|---|---|
| Climate variables | 0.1 | 0.1 | -9.58 | 36.97 |
| Land cover | 0.0028 | 0.0028 | -9.5472 | 36.9611 |
| Soil organic carbon | 0.0023 | 0.0024 | -9.5468 | 36.9609 |

The climate variables (which share the same spatial dimensions) have a lower resolution than the land cover and SOC layers and also have a different origin. Therefore, I resampled the land cover and SOC layers to the resolution, origin and extent of the climate variables.
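
A comparison like this can be sketched with terra's `res()`, `xmin()`, `ymin()` and `compareGeom()`. The two grids below are hypothetical and empty; they only mimic a coarse and a fine layer. Reporting the lower-left corner as the "origin" is an assumption about how the values above were obtained.

```r
library(terra)

# Hypothetical empty grids: a coarse "climate" grid and a fine "SOC" grid
climate <- rast(xmin = -10, xmax = -6, ymin = 37, ymax = 42,
                resolution = 0.1, crs = "EPSG:4326")
soc_fine <- rast(xmin = -9.99, xmax = -5.99, ymin = 37.01, ymax = 42.01,
                 resolution = 0.0025, crs = "EPSG:4326")

# Collect resolution and lower-left corner of a grid in one row
describe_grid <- function(x, name) {
  data.frame(layer = name,
             res_lon = res(x)[1], res_lat = res(x)[2],
             xmin = xmin(x), ymin = ymin(x))
}
geometry <- rbind(describe_grid(climate, "climate"),
                  describe_grid(soc_fine, "SOC"))
print(geometry)

# Returns FALSE instead of stopping when the geometries differ
same_geometry <- compareGeom(climate, soc_fine, stopOnError = FALSE)
```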

2.2 Data resampling

2.2.1 Land cover data integration

Land cover is a categorical variable, and I explored two options for resampling it. The first is the “nearest neighbor” method, which is typically used for categorical variables: it assigns each new pixel the value of the nearest original pixel without any interpolation. However, when resampling a land cover layer to a lower resolution, it can be more meaningful to assign the class held by the majority of high-resolution pixels within each lower-resolution cell. I tested both options using the code below and compared the resulting maps visually.

# resample option 1 using nearest neighbor
landcover_pt_near <- terra::resample(landcover_pt, resample_raster, method = "near")

# resample option 2 using majority
landcover_pt_majority <- 
  exactextractr::exact_resample(landcover_pt, 
                                resample_raster, 
                                'majority')

Land cover class patterns in Portugal in 2020 resampled to climate data with two different methods.

When resampling with the method “near”, the land cover classes appear very patchy and fragmented. The layer resampled with “majority” represents the classes more continuously, and the logic of that method is more sound, so I am using that layer for the analysis.
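
The difference between the two rules can be seen in a toy example in base R. This is an illustration of the two definitions only, not of how terra or exactextractr work internally. The 6 x 6 grid of class codes (10 and 50) is hypothetical, and each coarse cell covers a 3 x 3 block, so the fine cell nearest to the coarse cell center is the middle cell of the block.

```r
# Hypothetical fine-resolution class codes (6 x 6 cells)
fine <- matrix(c(
  10, 10, 10, 50, 50, 50,
  10, 50, 10, 50, 50, 50,
  10, 10, 10, 50, 50, 10,
  50, 50, 10, 10, 10, 10,
  50, 10, 50, 10, 10, 10,
  10, 50, 50, 10, 50, 10), nrow = 6, byrow = TRUE)

block <- 3                      # each coarse cell covers 3 x 3 fine cells
n_out <- nrow(fine) / block
nearest <- matrix(NA_real_, n_out, n_out)
majority <- matrix(NA_real_, n_out, n_out)

for (i in seq_len(n_out)) {
  for (j in seq_len(n_out)) {
    rows <- ((i - 1) * block + 1):(i * block)
    cols <- ((j - 1) * block + 1):(j * block)
    cells <- fine[rows, cols]
    # nearest neighbor: value of the fine cell at the block center
    nearest[i, j] <- cells[2, 2]
    # majority: most frequent class among all fine cells in the block
    majority[i, j] <- as.numeric(names(which.max(table(cells))))
  }
}

print(nearest)   # rows: 50 50 / 10 10
print(majority)  # rows: 10 50 / 50 10
```

In the top-left block, a single pixel of class 50 sits at the center of eight pixels of class 10, so the nearest-neighbor rule reports a class that covers only one ninth of the cell.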

2.2.2 Soil organic carbon data integration

Soil organic carbon (SOC) is a continuous variable, and I want to work with the mean SOC per lower-resolution cell. Continuous rasters are typically resampled with the bilinear method, which calculates the value at a grid location as a weighted average of the four nearest cell centers. I tested this option as well as calculating the mean with the exact_resample() function. The latter averages over all high-resolution cells covered by the lower-resolution cell instead of only four. I compared the maps visually in terms of pattern.

# resample option 1 using bilinear interpolation
SOC_pt_bil <- terra::resample(SOC_pt, resample_raster, method = "bilinear")

# resample option 2 using the mean of all covered cells
SOC_pt_mean <- exactextractr::exact_resample(SOC_pt, 
                                             resample_raster, 
                                             'mean')

Soil organic carbon layer in Portugal resampled with two different methods.

When resampling with exact_resample() and the function “mean”, the pattern of low SOC values along the coast and high values in the north of Portugal is maintained, so I am going to use that layer for the analysis.
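
A toy calculation in base R shows why the two approaches can differ. It idealises the two definitions given above and is not a reproduction of any library's implementation. The 4 x 4 grid of SOC values is hypothetical and is covered by a single coarse cell, whose center lies where the four central fine cells meet, so a bilinear estimate weights those four cells equally.

```r
# Hypothetical fine-resolution SOC values (t/ha) covered by one coarse cell
fine <- matrix(c(
  80, 80, 80, 80,
  80, 20, 20, 80,
  80, 20, 20, 80,
  80, 80, 80, 80), nrow = 4, byrow = TRUE)

# Bilinear idea: only the four cells nearest to the coarse cell center count
bilinear_value <- mean(fine[2:3, 2:3])

# Block mean: every fine cell covered by the coarse cell counts
block_mean <- mean(fine)

print(c(bilinear = bilinear_value, mean = block_mean))  # 20 and 65
```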

3 Analysis and Visualization

Before proceeding with the analysis, I used the compareGeom() function to check that the dimensions of all layers match.

compareGeom(evapotransp_pt, precipitation_pt, temperature_pt,
            landcover_pt, SOC_pt)

3.1 Exploring land cover distribution

I analysed climate and soil organic carbon within the land cover classes in Portugal and over time. First, I explored the share of each land cover class within the country.

Land cover classes of Portugal and their proportions. All classes with bars below the dashed line were excluded for further analysis, as well as water and NA.

For the next steps, I excluded the land cover classes that make up less than 1% of all pixels (i.e. 10 pixels or fewer), marked by the dashed line in the plot. I also excluded pixels with land cover “water” or “NA”, which accounted for 2.5% and 1% of all pixels, respectively.
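
The filtering rule can be sketched in base R as follows. The vector of class labels is hypothetical, and the class names are placeholders rather than the labels of the land cover product.

```r
# Hypothetical land cover label for each of 1000 pixels
landcover <- c(rep("Tree cover", 600), rep("Cropland", 300),
               rep("Shrubland", 60), rep("Urban", 5),
               rep("Water", 25), rep(NA, 10))

# Share of each class among all pixels, counting NA as its own group
share <- prop.table(table(landcover, useNA = "ifany"))
print(round(share, 3))

# Keep classes with at least 1% of all pixels, then drop water and NA
keep <- names(share)[as.vector(share) >= 0.01]
keep <- setdiff(keep, c("Water", NA))

# Pixels that remain for the analysis
landcover_kept <- landcover[landcover %in% keep]
```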

3.2 Exploring climate variables over time

I examined the climate variables within the land cover classes over time. Depending on the message to be conveyed, the plots can emphasize different aspects. I first plotted the monthly mean values from 2020 to 2022 for all land cover classes together, which makes differences between the classes easy to see.

3.2.1 Temperature

Mean temperature in different land cover classes in Portugal from 2020 to 2022.

3.2.2 Precipitation

Mean precipitation in different land cover classes in Portugal from 2020 to 2022.

3.2.3 Evapotranspiration

Mean evapotranspiration in different land cover classes in Portugal from 2020 to 2022.

3.3 Exploring mean temperature for each land cover class (example)

Depending on the focus of the visualization, I could also look at each land cover class separately. This view is better suited to comparing the individual patterns, and it allows the standard deviation to be plotted as an error band around each line, which is hard to see in the combined plot. I am showing temperature here as an example.

Mean temperature in different land cover classes in Portugal from 2020 to 2022.

3.4 Exploring soil organic carbon within land cover classes

I can visualize mean soil organic carbon per landcover class. As I only have one time point for this layer I used a bar plot for visualisation.

Average soil organic carbon (t/ha) per land cover classes of Portugal.

Error bars show the standard deviation around the mean per class.

4 Designing a Sampling Scheme for Soil Organic Carbon

4.1 Simple scheme for sampling

I calculated the necessary sample size following:

\[ n = \left(\frac{z \times \sigma}{E}\right)^2 \]

to detect changes in SOC with a margin of error (E) of at most 10% of the mean value at the 95% confidence level, assuming a Gaussian distribution.

The formula calculates the required sample size (n) needed to estimate a population mean within a desired margin of error (E) at a specified confidence level. It considers the variability of the population (σ) and the critical value from the standard normal distribution (z).
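
As a worked step that follows directly from the formula: if E is expressed as a fraction r of the mean μ (r = 0.1 in the code that follows), the required sample size depends only on the ratio σ/μ, i.e. the coefficient of variation:

\[ n = \left(\frac{z \times \sigma}{r \times \mu}\right)^2 = \left(\frac{z}{r} \times \frac{\sigma}{\mu}\right)^2 \]

With z ≈ 1.96 and r = 0.1, this gives n ≈ 384 × (σ/μ)². As a purely illustrative number, a layer whose standard deviation is half its mean would need about 96 samples.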

The sample size can thus be calculated with the following code:

# calculate variables based on input and SOC
confidence_level <- 0.95
z_score <- qnorm(1 - (1 - confidence_level) / 2)
standard_deviation <- sd(sampling_env$SOC_df$SOC) 
desired_width <- 0.1 * mean(sampling_env$SOC_df$SOC)

# Calculate the number of samples required
nr_samples <- (z_score * standard_deviation / desired_width) ^ 2

Working with the SOC layer at the resampled ~0.1° resolution, the necessary sample size is 28. This value is the minimum sample size needed for the analysis.

Working with the SOC layer at the original ~250 m resolution, and thus with higher variability in the data, the necessary sample size would be 40.
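
To repeat the calculation for layers at different resolutions, the steps can be wrapped in a small helper. This is a sketch: the function name `required_samples()` and the example values are hypothetical, and rounding up to the next whole sample is an assumption.

```r
# Sample size needed so that the margin of error is at most
# rel_margin * mean(values) at the given confidence level
required_samples <- function(values, rel_margin = 0.1,
                             confidence_level = 0.95) {
  values <- values[!is.na(values)]
  z_score <- qnorm(1 - (1 - confidence_level) / 2)
  margin <- rel_margin * mean(values)
  # round up, since a fraction of a sample cannot be taken
  ceiling((z_score * sd(values) / margin)^2)
}

soc_example <- c(40, 50, 60, 70, 80)  # hypothetical SOC values in t/ha
n_example <- required_samples(soc_example)
print(n_example)  # 27
```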

4.2 Visualising a random sampling in space

If we wanted to sample these points in space we could use the spatSample() function to suggest random coordinates within Portugal.
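
A minimal sketch of such a call, assuming the terra package and a synthetic raster in place of the SOC layer. The raster values are random placeholders; the sample size of 28 is taken from section 4.1.

```r
library(terra)
set.seed(42)

# Hypothetical SOC raster on a 0.1 degree grid with random values
soc <- rast(xmin = -10, xmax = -6, ymin = 37, ymax = 42,
            resolution = 0.1, crs = "EPSG:4326")
values(soc) <- runif(ncell(soc), min = 20, max = 100)

# Draw 28 random cells, skipping NA cells, and return them as points
sample_points <- spatSample(soc, size = 28, method = "random",
                            na.rm = TRUE, as.points = TRUE)

# Coordinates of the suggested sampling locations
coords <- crds(sample_points)
head(coords)
```

Setting `na.rm = TRUE` matters for a clipped layer, because cells outside the country outline are NA and should not be suggested as sampling locations.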

Random sampling scheme for the soil organic carbon content (t/ha) in Portugal for the lower resolution of the soil layer.

4.3 Visualising stratified sampling in space

We can also design a sampling scheme for soil organic carbon content that is stratified for land cover classes. In this case we calculate the necessary sample size for the SOC values of all pixels with that landcover class.

Stratified sampling scheme for the soil organic carbon content (t/ha) within land cover classes in Portugal.

In the plot, the sample size for each class is given after the class name. Classes with 0 samples cannot be sampled with the necessary confidence, as they are not represented well enough in the dataset.
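
The per-class calculation can be sketched in base R. The data frame and class names are hypothetical. Setting the sample size to 0 when a class has fewer pixels than the required number is an assumption about how "not represented well enough" could be operationalised, not necessarily the rule used for the plot.

```r
# Hypothetical SOC values (t/ha) per pixel with their land cover class
soc_df <- data.frame(
  landcover = c(rep("A", 50), rep("B", 3)),
  SOC = c(rep(c(40, 50, 60, 70, 80), times = 10), 20, 60, 100)
)

z_score <- qnorm(1 - (1 - 0.95) / 2)

# Required sample size per class for a margin of error of 10% of the class mean
by_class <- split(soc_df$SOC, soc_df$landcover)
n_required <- vapply(by_class, function(v) {
  ceiling((z_score * sd(v) / (0.1 * mean(v)))^2)
}, numeric(1))

# Assumption: a class with fewer pixels than required gets 0 samples
n_available <- lengths(by_class)
n_samples <- ifelse(n_required <= n_available, n_required, 0)

print(n_samples)  # A: 22, B: 0
```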

5 Soil model

I implemented a very simple soil model using the RothC model in the ‘SoilR’ package. The setup assumes that the only information available is the percent clay content in the topsoil, which I extracted for a point within Portugal from the SoilGrids database, an assumed annual amount of litter inputs, and monthly averages of climatic variables for that same point. The model is run 300 years into the future.
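
A setup of this kind can be sketched with SoilR's `fT.RothC()`, `fW.RothC()`, `RothCModel()` and `getC()`. All numbers below are hypothetical placeholders, not the values used for this report. Starting the active pools at zero and estimating the inert pool from an assumed initial SOC stock with the Falloon equation are also assumptions.

```r
library(SoilR)

# Hypothetical monthly climate for one point (placeholders)
temperature   <- c(10, 11, 13, 15, 18, 22, 24, 24, 22, 18, 13, 11)        # deg C
precipitation <- c(90, 70, 60, 60, 50, 20, 5, 10, 40, 90, 100, 100)       # mm
evaporation   <- c(20, 30, 50, 70, 100, 130, 150, 140, 100, 60, 30, 20)   # mm

clay <- 20            # percent clay in the topsoil (placeholder)
soil_thickness <- 30  # cm (placeholder)
litter_input <- 2     # annual C input, t/ha/yr (placeholder)
soc_initial <- 60     # t/ha, only used to estimate the inert pool

# Monthly time steps over 300 years
years <- seq(1 / 12, 300, by = 1 / 12)

# Monthly rate modifiers for temperature and moisture
f_temp <- fT.RothC(temperature)
f_moist <- fW.RothC(P = precipitation, E = evaporation,
                    S.Thick = soil_thickness, pClay = clay,
                    pE = 1, bare = FALSE)$b
xi <- data.frame(years, rep(f_temp * f_moist, length.out = length(years)))

# Inert organic matter estimated with the Falloon equation (assumption)
iom <- 0.049 * soc_initial^1.139

model <- RothCModel(t = years,
                    C0 = c(DPM = 0, RPM = 0, BIO = 0, HUM = 0, IOM = iom),
                    In = litter_input, clay = clay, xi = xi)

# Carbon stock in each pool at every time step; last row = final pool sizes
pools <- getC(model)
final_pools <- tail(pools, 1)
```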

Output of simple soil model using RothC.

The final pool sizes of Decomposable Plant Material (DPM), Resistant Plant Material (RPM), Microbial Biomass (BIO), Humified Organic Matter (HUM), and Inert Organic Matter (IOM) for this point in Portugal with the assumed parameters are then:

| DPM | RPM | BIO | HUM | IOM |
|---|---|---|---|---|
| 0.1477609 | 2.1369862 | 0.2786526 | 11.4037173 | 5.4357393 |

This simple model could be further expanded and tested for other areas.

The End